{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 正则化\n",
    "\n",
    "有很多方法来限制神经网络的容量（拟合复杂函数的能力）以防止过拟合。正则化就是一种很常见和很有效的方式。\n",
    "\n",
    "加上正则项的目标函数可以表示为：\n",
    "\n",
    "$$J(\\theta)+\\lambda R(W)$$\n",
    "\n",
    "那么，为什么正则化可以防止过拟合呢？\n",
    "首先，越复杂的模型，通常需要更大的W（W的长度）才能拟合。\n",
    "我们最小化目标函数的过程中，实际上也需要最小化正则项$\\lambda R(W)$，这也就是说，W的范数要尽可能小。这也就是说，我们需要尽可能多的W的值等于0。翻译过来，就是我们想要用尽量少的W去拟合我们的模型。\n",
    "\n",
    "这里的范数，是度量W的长度的量。常见的就是L0、L1、L2范数。\n",
    "\n",
    "L1范数：$||W||$\n",
    "\n",
    "L2范数：$W^2$\n",
    "\n",
    "![L1_and_L2](images/L1_L2_decent.PNG)\n",
    "\n",
    "常见的正则化手段包括：\n",
    "\n",
    "* L1正则化\n",
    "* L2正则化\n",
    "* Elastic net正则化(L1+L2)\n",
    "* Dropout\n"
   ]
  },
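  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick illustration of the norms above (a minimal NumPy sketch; the vector `w` is made-up example data):\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "w = np.array([3.0, -4.0, 0.0])  # hypothetical weight vector\n",
    "\n",
    "l0 = np.count_nonzero(w)   # L0 \"norm\": number of nonzero entries\n",
    "l1 = np.sum(np.abs(w))     # L1 norm: sum of absolute values\n",
    "l2_sq = np.sum(w ** 2)     # squared L2 norm: sum of squares\n",
    "\n",
    "print(l0, l1, l2_sq)  # 2 7.0 25.0\n",
    "```\n"
   ]
  },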
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## L2正则化\n",
    "\n",
    "L2正则化可能是最常见的正则化手段。L2正则化意思是，在目标函数的后面，加上一个正则项：$\\lambda W^2$。\n",
    "但是通常的情况是，加上$\\frac{1}{2}\\lambda W^2$，因为这样可以使得改正则向的导数为$\\lambda W$而不是$2\\lambda W$。\n",
    "\n",
    "L2正则化还有一个很有用的性质，它的梯度下降衰减是线性的： \n",
    "$$W += -\\lambda * W$$"
   ]
  },
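  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of one gradient step with the $\\frac{1}{2}\\lambda ||W||_2^2$ term (NumPy; the data-loss gradient `dW_data` is a made-up placeholder for whatever backpropagation would produce):\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "lam = 0.1  # regularization strength lambda (assumed value)\n",
    "lr = 0.01  # learning rate (assumed value)\n",
    "W = np.array([1.0, -2.0, 0.5])\n",
    "dW_data = np.zeros_like(W)  # pretend the data loss contributes nothing\n",
    "\n",
    "# gradient of (1/2) * lam * ||W||_2^2 with respect to W is lam * W\n",
    "dW = dW_data + lam * W\n",
    "W -= lr * dW  # every weight shrinks by the same factor: linear decay\n",
    "```\n"
   ]
  },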
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## L1正则化\n",
    "\n",
    "L1是另一个比较常见的正则化方式。L1有一个属性：在优化过程中，L1使得向量变得稀疏。"
   ]
  },
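  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This sparsity can be seen in a small toy problem: minimizing $\\frac{1}{2}(w - t)^2 + \\lambda |w|$ per component has the closed-form \"soft-thresholding\" solution, which sets small components exactly to zero, while the analogous L2 problem only shrinks them (a minimal NumPy sketch with made-up numbers, not part of the original text):\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "lam = 0.5\n",
    "t = np.array([0.3, -0.8, 0.05])  # hypothetical unregularized solution\n",
    "\n",
    "# L1: soft thresholding -- components with |t| <= lam become exactly 0\n",
    "w_l1 = np.sign(t) * np.maximum(np.abs(t) - lam, 0.0)\n",
    "\n",
    "# L2: minimizing (1/2)(w - t)^2 + (1/2)*lam*w^2 only rescales t\n",
    "w_l2 = t / (1.0 + lam)\n",
    "\n",
    "print(w_l1)  # two entries are exactly 0 -- sparse\n",
    "print(w_l2)  # all entries nonzero, just smaller\n",
    "```\n"
   ]
  },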
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Dropout\n",
    "\n",
    "![dropout_network](images/dropout_graph.PNG)\n",
    "![vanilla_dropout](images/dropout_vanilla.PNG)\n",
    "![inverted_dropout](images/dropout_inverted.PNG)\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
